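Fastq to Fasta File
To convert a fastq file to a fasta file, remove the quality header line and the quality score line from each record, and replace the @ at the start of the sequence header with a >.
Fastq File:
@EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG
CATCATCATCATCATCATCATCATCATCATCATCAT
+
BBBBCCCC?<A?BC?7@@???????DBBA@@@@A@@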
Fasta File:
>EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG
CATCATCATCATCATCATCATCATCATCATCATCAT
Python:
myFastq = open('myfile.fastq', 'r')  # open fastq file for reading
myFasta = open('myfile.fasta', 'w')  # open fasta file for writing
while True:  # loop until the end of the file is reached
    # read the 4 lines of one fastq record
    SequenceHeader = myFastq.readline()
    Sequence = myFastq.readline()
    QualityHeader = myFastq.readline()
    Quality = myFastq.readline()
    if SequenceHeader == '':  # exit loop when end of file is reached
        break
    # write output, replacing only the leading @ with >
    myFasta.write('>%s%s' % (SequenceHeader[1:], Sequence))
# close files
myFastq.close()
myFasta.close()
Bash:
# grep for every sequence header line and the line that follows it (-A 1) in your fastq file.
# Delete the separator lines ('--') that grep inserts between matches, then replace the leading @ with >.
# The '|' character pipes the output of one command into the next command.
# The grep search relies on 'EAS' being common to all sequence headers in your fastq file.
grep -A 1 '^@EAS' myfile.fastq | sed '/^--$/d' | sed 's/^@/>/' > myfile.fasta
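The grep approach depends on every header sharing the 'EAS' prefix. As a hedged alternative, here is a minimal Python sketch that relies only on the position of each line within a record, assuming every record occupies exactly 4 lines. The function names fastq_to_fasta_lines and convert_fastq_file are illustrative, not part of any library.

```python
def fastq_to_fasta_lines(lines):
    """Yield fasta lines for an iterable of fastq lines (4 lines per record)."""
    record = []
    for line in lines:
        line = line.rstrip('\r\n')
        if not record and not line:
            continue  # skip blank lines between records
        record.append(line)
        if len(record) == 4:
            header, sequence, quality_header, quality = record
            # sanity check: a record starts with @ and its third line with +
            if not header.startswith('@') or not quality_header.startswith('+'):
                raise ValueError('not a 4-line fastq record: %r' % header)
            yield '>' + header[1:]  # replace only the leading @
            yield sequence
            record = []
    if record:
        raise ValueError('incomplete fastq record at end of input')


def convert_fastq_file(fastq_path, fasta_path):
    """Convert a fastq file on disk to a fasta file (paths are up to you)."""
    with open(fastq_path) as fastq, open(fasta_path, 'w') as fasta:
        for fasta_line in fastq_to_fasta_lines(fastq):
            fasta.write(fasta_line + '\n')
```

Calling convert_fastq_file('myfile.fastq', 'myfile.fasta') would write the same kind of output as the loop-based script, but it stops with an error if a record does not have the expected layout.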
Subset File
Here is an example for sub-setting a fastq file containing 1000 sequences into 10 fastq files containing 100 sequences each.
Bash:
# Loop over the range of files you need to generate (1000/100 = 10).
# Each fastq record spans 4 lines, so 100 sequences correspond to 400 lines.
# Create a variable j that keeps track of how many lines you have processed.
# Pipe (|) the top j lines (head -n) from your file to the tail command to grab the last 400 lines (tail -n 400).
# Redirect (>) the lines grabbed by tail into a new file.
for ((i=1; i<=10; i=i+1)); do j=$((i*400)); head -n $j myfile.fastq | tail -n 400 > new_$i.fastq; done
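The same sub-setting can be sketched in Python. This is only an illustration, assuming each fastq record occupies exactly 4 lines (as in the example record shown earlier); the function name split_fastq and its parameters are hypothetical. It reads the file once instead of re-reading it from the top for every output file.

```python
import itertools


def split_fastq(path, records_per_file=100, prefix='new_'):
    """Write consecutive chunks of a 4-line-per-record fastq file to numbered files.

    Returns the list of file names that were written.
    """
    lines_per_file = records_per_file * 4  # 4 lines per fastq record
    written = []
    with open(path) as fastq:
        for index in itertools.count(1):
            # take the next block of lines from the open file
            chunk = list(itertools.islice(fastq, lines_per_file))
            if not chunk:
                break  # end of file reached
            out_path = '%s%d.fastq' % (prefix, index)
            with open(out_path, 'w') as out:
                out.writelines(chunk)
            written.append(out_path)
    return written
```

For a file with 1000 sequences, split_fastq('myfile.fastq') would write new_1.fastq through new_10.fastq. If the number of sequences is not a multiple of records_per_file, the last file simply holds the remainder.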
Replace Mac Line Breaks
Bash:
# Translate every carriage return (\r) into a newline (\n) and print the result.
cat yourfile | tr '\r' '\n'
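A rough Python equivalent is sketched below. The function names are illustrative. Unlike the tr command, this sketch converts Windows line breaks (\r\n) first, so that a file with Windows line breaks does not end up with an empty line after every line.

```python
def replace_mac_line_breaks(text):
    """Return text with carriage-return line breaks replaced by newlines."""
    # handle Windows line breaks first, then any remaining bare carriage returns
    return text.replace('\r\n', '\n').replace('\r', '\n')


def convert_file(in_path, out_path):
    """Rewrite in_path to out_path using newline characters only."""
    # newline='' switches off Python's own line-break translation
    with open(in_path, newline='') as source:
        text = source.read()
    with open(out_path, 'w', newline='') as target:
        target.write(replace_mac_line_breaks(text))
```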
Remove Line Breaks from Sequences
Here's how to get a sequence with line breaks onto the same line.
Line Breaks:
>EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG
CATCATCATCATCAT
CATCATCATCATCAT
CATCAT
No Line Breaks:
>EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG
CATCATCATCATCATCATCATCATCATCATCATCAT
This approach will work for fasta files and will require some modification for fastq files.
Python:
myFasta = open('myfile.fasta', 'r')  # open fasta file for reading
NewFile = open('sameline.fasta', 'w')  # open new fasta file for writing
line = myFasta.readline()  # read first line in fasta file
while line:  # loop over lines in fasta file
    NewFile.write(line)  # write header line to new file
    sequenceList = []  # initiate empty list for storing sequence lines
    line = myFasta.readline()  # read next line from fasta file
    while line and not line.startswith('>'):  # loop over sequence lines
        sequenceList.append(line.strip('\n'))  # strip line break from line and append to sequenceList
        line = myFasta.readline()  # read next line from fasta file
    NewFile.write('%s\n' % ''.join(sequenceList))  # write sequence to new file
# close files
myFasta.close()
NewFile.close()
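For fastq files, one possible modification is sketched below. It is only a sketch, under the assumption that a quality string has the same length as its sequence. Because a quality line may itself start with @, the sketch does not look for the next header; it collects sequence lines until the + line and then reads quality lines until their combined length matches the sequence. The function names unwrap_fastq and write_unwrapped are illustrative.

```python
def unwrap_fastq(lines):
    """Yield (header, sequence, quality) for fastq records that may be wrapped."""
    lines = (line.rstrip('\r\n') for line in lines)
    for header in lines:
        if not header:
            continue  # skip blank lines between records
        if not header.startswith('@'):
            raise ValueError('expected a header starting with @: %r' % header)
        sequence_parts = []
        for line in lines:
            if line.startswith('+'):
                break  # quality header reached
            sequence_parts.append(line)
        else:
            raise ValueError('no + line found for %r' % header)
        sequence = ''.join(sequence_parts)
        quality = ''
        # read quality lines until they cover the whole sequence
        while len(quality) < len(sequence):
            line = next(lines, None)
            if line is None:
                raise ValueError('quality too short for %r' % header)
            quality += line
        if len(quality) != len(sequence):
            raise ValueError('quality longer than sequence for %r' % header)
        yield header, sequence, quality


def write_unwrapped(in_path, out_path):
    """Write each record of in_path to out_path on exactly 4 lines."""
    with open(in_path) as fastq, open(out_path, 'w') as out:
        for header, sequence, quality in unwrap_fastq(fastq):
            out.write('%s\n%s\n+\n%s\n' % (header, sequence, quality))
```

Note that write_unwrapped writes a bare + as the quality header, dropping any text that followed the + in the input.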